Scalable Data Clustering: A Sammon's Projection Based Technique for Merging GSOMs

نویسندگان

  • Hiran Ganegedara
  • Damminda Alahakoon
چکیده

Self-Organizing Map and Growing Self-Organizing Map are widely used techniques for exploratory data analysis. The key desirable features of these techniques are applicability to real world data sets and the ability to visualize high dimensional data in low dimensional output space. One of the core problems of using SOM/GSOM based techniques on large datasets is the high processing time requirement. A possible solution is the generation of multiple maps for subsets of data where the subsets consist of the entire dataset. However the advantage of topographic organization of a single map is lost in the above process. This paper proposes a new technique where Sammon’s projection is used to merge an array of GSOMs generated on subsets of a large dataset. We demonstrate that the accuracy of clustering is preserved after the merging process. This technique utilizes the advantages of parallel computing resources.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Merging Similarity and Trust Based Social Networks to Enhance the Accuracy of Trust-Aware Recommender Systems

In recent years, collaborative filtering (CF) methods are important and widely accepted techniques are available for recommender systems. One of these techniques is user based that produces useful recommendations based on the similarity by the ratings of likeminded users. However, these systems suffer from several inherent shortcomings such as data sparsity and cold start problems. With the dev...

متن کامل

Retraining the Neural Network for Data Visualization

In this paper, we discuss the visuaHzation of multidimensional data. A well-known procedure for mapping data from a high-dimensional space onto a lower-dimensional one is Sammon's mapping. The algorithm is oriented to minimize the projection error. We investigate an unsupervised backpropagation algorithm to train a multilayer feed-forward neural network (SAMANN) to perform the Sammon's nonlinea...

متن کامل

خوشه‌بندی داده‌ها بر پایه شناسایی کلید

Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and its associated cluster structures. Recent trend on big data analysis poses a more demanding requirement on new clustering algorithm to be both scalable and accura...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory used for in clustering problems. Previous researches showed that this theory can significantly increase the stability and performance of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011